[Bugfix] [Numpy] Add kAddTo and kNullOp to Transpose #16979
Conversation
Squashed commit messages: Check for repeated axes; enable addto to transpose; fix; fix; fix; fix; remove unused ndim; Update pseudo2DTranspose_op-inl.cuh; Update pseudo2DTranspose_op-inl.cuh; Update pseudo2DTranspose_op-inl.cuh; fix; Update pseudo2DTranspose_op-inl.cuh; try to fix; Update pseudo2DTranspose_op-inl.cuh; Update pseudo2DTranspose_op-inl.cuh; Update pseudo2DTranspose_op-inl.cuh; fix; Update np_matrix_op.cc; Update test_numpy_op.py; update test case; fix implementation; fix bug; update; fix bug; Update pseudo2DTranspose_op-inl.cuh; fix; fix; Update test_numpy_op.py
  CHECK_EQ(inputs.size(), 1U);
  CHECK_EQ(outputs.size(), 1U);

-  if (SupportMKLDNNTranspose(param, inputs[0])) {
+  if (SupportMKLDNNTranspose(param, inputs[0]) && req[0] == kWriteTo) {
Thanks!
template <typename DType, typename CType>
/*!
 * \brief The `transpose_pseudo2D` based on chosen vectorized types. It transpose an array of
@ptrendx I've also added the doc here for transpose_pseudo2D. Correct me if you find any problems.
#pragma unroll
for (index_t i = 0; i < TSR; i++) {
  DType* tmp_dptr = reinterpret_cast<DType*>(&tmp[i]);
Hmmm, I'm not sure if this will still be handled properly by the compiler, let me test that.
Have you found any problems? It compiles for me and also passes CI.
Hi, I only now had a chance to look into it. No, there is no problem. I was worried that the compiler could get confused by this and put the tmp array in local memory instead of registers, but I tested it and it does not.
Great! Thanks for clarifying 👍
LGTM.
src/operator/tensor/matrix_op-inl.h (Outdated)
  } else {
-    Copy(out, in, s);
+    LOG(FATAL) << "Not Implemented. We should never reach here. Report an issue in Github.";
Can we improve the error message by pointing out that this case should have been handled by transpose_pseudo2D?
  mshadow::Tensor<xpu, 2, DType> in = src.FlatTo2D<xpu, DType>(s);
  mshadow::Tensor<xpu, 2, DType> out = ret.FlatTo2D<xpu, DType>(s);

-  if (axes[0] == 1 && axes[1] == 0) {
Why was this if case removed? mx.nd.transpose(a, axes=(0,1)) is basically no transpose (and it was handled in this branch).
Realized you moved the Copy call above and called it IdentityTranspose. Makes sense now.
  if (ctx.get_ctx().dev_mask() == cpu::kDevMask) {
    Transpose2D<DType>(in.dptr_, out.dptr_, in.shape_[0], in.shape_[1]);
  } else {
    out = in.T();
The PR for 2D transpose on GPU is still WIP (#16706). Until then, we default to the mshadow expression-template based implementation of Transpose.
@ChaiBapchya We already have transpose_pseudo2D, which covers the 2D transpose case on GPU.
Ah right.
@sxjscience Is it possible to construct a case where Transpose has req kAddTo? I'm considering how to support that in the MKL-DNN backend. Thanks!
@TaoLv Just transpose the weight before using it (e.g. take an NCHW weight, transpose it, and use it in an NHWC convolution), and set grad_req to 'add' in simple_bind (a Gluon example would be pretty similar).
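Here is a minimal Gluon/autograd sketch of that scenario (the shapes, names, and the dot-product workload are illustrative assumptions, not taken from the PR): attaching the gradient with grad_req='add' means the backward pass of the transpose must accumulate into the weight's gradient buffer, i.e. run with req == kAddTo.

import mxnet as mx
from mxnet import np, npx, autograd
npx.set_np()

# Hypothetical repro: the weight is transposed before use and its gradient
# request is 'add', so the backward transpose has to accumulate (kAddTo).
w = np.random.uniform(size=(4, 3))
w.attach_grad(grad_req='add')
x = np.random.uniform(size=(5, 3))

with autograd.record():
    y = np.dot(x, np.transpose(w))  # (5, 3) x (3, 4) -> (5, 4)
y.backward()
print(w.grad.shape)  # (4, 3); repeated backward calls keep accumulating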
Description
Previously, we could run the following code without any problem:
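(The exact snippet from the original description was not preserved in this scrape. Based on the "Check for repeated axes" commit in this PR, a hypothetical example of the kind of call that used to be accepted silently might look like the following; the array shape and axes are illustrative assumptions.)

import mxnet as mx
from mxnet import np, npx
npx.set_np()

# Hypothetical example: axis 0 is repeated in the axes argument.
# Previously this was accepted silently; with this PR it raises MXNetError.
a = np.ones((2, 3, 4))
b = np.transpose(a, (0, 0, 1))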
Now, it raises an MXNetError.
Checklist
Essentials
Changes